59 research outputs found
Finding any Waldo: zero-shot invariant and efficient visual search
Visual search constitutes a ubiquitous challenge in natural vision, including daily tasks such as finding a friend in a crowd or searching for a car in a parking lot. Visual search must fulfill four key properties: selectivity (to distinguish the target from distractors in a cluttered scene), invariance (to localize the target despite changes in its rotation, scale, and illumination, and even to search for generic object categories), speed (to efficiently localize the target without exhaustive sampling), and generalization (to search for any object, even ones with which we have had minimal or no experience). Here we propose a computational model, directly inspired by neurophysiological recordings during visual search in macaque monkeys, which maps the discriminative power of object recognition models onto the problem of visual search. The model takes two inputs, a target object and a search image, and produces a sequence of fixations. The model consists of a deep convolutional network that extracts features of the target object, stores those features, and uses them in a top-down fashion to modulate the responses to the search image, thus generating a task-dependent saliency map. We show that the model fulfills the critical properties outlined above, distinguishing it from heuristic approaches such as template matching, random search, sliding windows, bottom-up saliency maps, and object detection algorithms. Furthermore, we directly compare the model against human eye movement behavior during three increasingly complex tasks in which subjects have to search for a target object in a multi-object array image, in natural scenes, or in the well-known Waldo search task. We show that the model provides a reasonable first-order approximation to human behavior and can efficiently find targets in an invariant manner, without any training on the target objects.
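The top-down modulation step described above can be illustrated with a minimal numpy sketch. All names, array shapes, and the cosine-similarity normalization here are hypothetical choices for the example, not the paper's actual implementation: the target's pooled feature vector weights the channels of the search-image feature map, producing a task-dependent saliency map whose successive peaks, with inhibition of return, form a fixation sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for deep features: C channels over an H x W spatial grid.
C, H, W = 8, 16, 16
search_feats = rng.normal(size=(C, H, W))   # features of the search image
target_feats = rng.normal(size=C)           # pooled features of the target

# Plant the "target" at location (10, 4).
search_feats[:, 10, 4] = target_feats

def saliency(search_feats, target_feats):
    """Top-down modulation: weight each channel of the search-image features
    by the target's activation in that channel, sum over channels, and
    normalize by local feature magnitude (i.e. cosine similarity)."""
    w = target_feats / np.linalg.norm(target_feats)
    num = np.tensordot(w, search_feats, axes=([0], [0]))  # H x W map
    return num / np.linalg.norm(search_feats, axis=0)

def fixate(search_feats, target_feats, max_fixations=5):
    """Generate a fixation sequence: move to the saliency peak, then
    suppress it (inhibition of return) and repeat."""
    smap = saliency(search_feats, target_feats)
    fixations = []
    for _ in range(max_fixations):
        y, x = np.unravel_index(np.argmax(smap), smap.shape)
        fixations.append((int(y), int(x)))
        smap[y, x] = -np.inf  # inhibition of return
    return fixations

print(fixate(search_feats, target_feats)[0])  # -> (10, 4): the planted target is fixated first
```

Because the planted location matches the target vector exactly, its cosine similarity is 1 and it wins the first fixation without any target-specific training, which is the zero-shot property the abstract emphasizes.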
Integrating Curricula with Replays: Its Effects on Continual Learning
Humans engage in learning and reviewing processes with curricula when
acquiring new skills or knowledge. This human learning behavior has inspired
the integration of curricula with replay methods in continual learning agents.
The goal is to emulate the human learning process, thereby improving knowledge
retention and facilitating learning transfer. Existing replay methods in
continual learning agents involve the random selection and ordering of data
from previous tasks, which has been shown to be effective. However, limited research
has explored the integration of different curricula with replay methods to
enhance continual learning. Our study takes initial steps in examining the
impact of integrating curricula with replay methods on continual learning in
three specific aspects: the interleaved frequency of replayed exemplars with
training data, the sequence in which exemplars are replayed, and the strategy
for selecting exemplars into the replay buffer. These aspects of curricula
design align with cognitive psychology principles and leverage the benefits of
interleaved practice during replays, easy-to-hard rehearsal, and an exemplar
selection strategy that draws exemplars from a uniform distribution of
difficulties. Based on our results, these three curricula effectively mitigated
catastrophic forgetting and enhanced positive knowledge transfer, demonstrating
the potential of curricula in advancing continual learning methodologies. Our
code and data are available:
https://github.com/ZhangLab-DeepNeuroCogLab/Integrating-Curricula-with-Replays
Comment: 8 pages, 6 figures, accepted in AAAI Summer Symposium Series Proceedings
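The three curriculum design choices examined above can be sketched in a few lines. This is a hypothetical illustration: the exemplar pool, difficulty scores, and parameter names are invented for the example and are not the paper's implementation.

```python
import random

random.seed(0)

# Hypothetical exemplar pool: (id, difficulty) pairs with difficulty in [0, 1].
exemplars = [(i, i / 19) for i in range(20)]

def select_uniform_difficulty(pool, k, bins=5):
    """Curriculum 3: fill the replay buffer with exemplars drawn from a
    uniform distribution of difficulties (k // bins draws per bin)."""
    buffer = []
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        candidates = [e for e in pool
                      if lo <= e[1] < hi or (b == bins - 1 and e[1] == 1.0)]
        buffer += random.sample(candidates, k // bins)
    return buffer

def order_easy_to_hard(buffer):
    """Curriculum 2: rehearse easy exemplars before hard ones."""
    return sorted(buffer, key=lambda e: e[1])

def interleave(new_batches, replay_items, every=2):
    """Curriculum 1: insert one replayed exemplar after every `every`
    batches of new-task data."""
    out, replays = [], iter(replay_items)
    for i, batch in enumerate(new_batches, start=1):
        out.append(("new", batch))
        if i % every == 0:
            out.append(("replay", next(replays, None)))
    return out

buf = order_easy_to_hard(select_uniform_difficulty(exemplars, k=5))
schedule = interleave(["b1", "b2", "b3", "b4"], buf, every=2)
print([tag for tag, _ in schedule])  # -> ['new', 'new', 'replay', 'new', 'new', 'replay']
```

Each function corresponds to one of the three curriculum aspects the abstract lists: replay-buffer selection, rehearsal ordering, and interleaving frequency with new-task data.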
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory
Working memory (WM), a fundamental cognitive process facilitating the
temporary storage, integration, manipulation, and retrieval of information,
plays a vital role in reasoning and decision-making tasks. Robust benchmark
datasets that capture the multifaceted nature of WM are crucial for the
effective development and evaluation of AI WM models. Here, we introduce a
comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM
comprises 10 tasks and a total of 1 million trials, assessing 4
functionalities, 3 domains, and 11 behavioral and neural characteristics of WM.
We jointly trained and tested state-of-the-art recurrent neural networks and
transformers on all these tasks. We also include human behavioral benchmarks as
an upper bound for comparison. Our results suggest that AI models replicate
some characteristics of WM in the brain, most notably primacy and recency
effects, and neural clusters and correlates specialized for different domains
and functionalities of WM. Our experiments also reveal limitations of existing
models in approximating human behavior. This dataset serves as a
valuable resource for communities in cognitive psychology, neuroscience, and
AI, offering a standardized framework to compare and enhance WM models,
investigate WM's neural underpinnings, and develop WM models with human-like
capabilities. Our source code and data are available at
https://github.com/ZhangLab-DeepNeuroCogLab/WorM
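As a small illustration of the behavioral characteristics the benchmark measures, the sketch below computes a serial-position curve from simulated recall data, where primacy and recency appear as elevated accuracy at the first and last list positions. The recall probabilities are invented for this example and are not taken from the WorM dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, list_len = 200, 10

# Simulated per-position recall probability with a U shape (primacy + recency);
# these numbers are made up for the illustration.
pos = np.arange(list_len)
p = 0.4 + 0.35 * np.exp(-pos / 2) + 0.35 * np.exp(-(list_len - 1 - pos) / 2)
recalls = rng.random((n_trials, list_len)) < p   # trials x positions, boolean

def serial_position_curve(recalls):
    """Per-position recall accuracy. Primacy/recency show up as higher
    accuracy at the first/last positions than in the middle."""
    return recalls.mean(axis=0)

curve = serial_position_curve(recalls)
primacy = curve[0] > curve[list_len // 2]
recency = curve[-1] > curve[list_len // 2]
print(primacy, recency)  # -> True True
```

The same curve can be computed for a trained model's recall responses, which is how primacy and recency effects in AI models can be compared against the human benchmark.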
Object-centric Learning with Cyclic Walks between Parts and Whole
Learning object-centric representations from complex natural environments
equips both humans and machines with the ability to reason from low-level
perceptual features. To capture the compositional entities of the scene, we
propose cyclic walks between perceptual features extracted from CNNs or
transformers and object entities. First, a slot-attention module interfaces
with these perceptual features and produces a finite set of slot
representations. These slots can bind to any object entities in the scene via
inter-slot competitions for attention. Next, we establish entity-feature
correspondence with cyclic walks along paths of high transition probability,
based on pairwise similarity between perceptual features (the "parts") and
slot-bound object representations (the "whole"). The whole is greater than its
parts and
the parts constitute the whole. The part-whole interactions form cycle
consistencies, as supervisory signals, to train the slot-attention module. We
empirically demonstrate that the networks trained with our cyclic walks can
extract object-centric representations on seven image datasets in three
unsupervised learning tasks. In contrast to object-centric models equipped
with a decoder for image or feature reconstruction, our cyclic walks provide
strong supervision signals, avoiding computational overhead and enhancing
memory efficiency.
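The cyclic-walk objective can be sketched as follows. This is a minimal numpy illustration with random features and slots; the dimensions and the single slot-to-part-to-slot cycle are chosen for the example and are not the paper's implementation details.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, K, d = 12, 3, 4                       # parts, slots, feature dimension
feats = rng.normal(size=(N, d))          # perceptual features ("parts")
slots = rng.normal(size=(K, d))          # slot representations ("whole")

# Transition probabilities derived from pairwise similarity.
part_to_slot = softmax(feats @ slots.T, axis=1)   # N x K, rows sum to 1
slot_to_part = softmax(slots @ feats.T, axis=1)   # K x N, rows sum to 1

# One cyclic walk, slot -> part -> slot: the round trip should land back on
# the starting slot, which yields a cycle-consistency training signal.
roundtrip = slot_to_part @ part_to_slot           # K x K, rows sum to 1
loss = -np.log(np.diag(roundtrip)).mean()         # cross-entropy vs. identity
```

Minimizing this loss pushes each round trip toward the identity mapping over slots, so the supervisory signal comes entirely from the walk itself, with no decoder or reconstruction target.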
Finding any Waldo: zero-shot invariant and efficient visual search
Searching for a target object in a cluttered scene constitutes a fundamental
challenge in daily vision. Visual search must be selective enough to
discriminate the target from distractors, invariant to changes in the
appearance of the target, efficient to avoid exhaustive exploration of the
image, and must generalize to locate novel target objects with zero-shot
training. Previous work has focused on searching for perfect matches of a
target after extensive category-specific training. Here we show for the first
time that humans can efficiently and invariantly search for natural objects in
complex scenes. To gain insight into the mechanisms that guide visual search,
we propose a biologically inspired computational model that can locate targets
without exhaustive sampling and generalize to novel objects. The model provides
an approximation to the mechanisms integrating bottom-up and top-down signals
during search in natural scenes.
Comment: Number of figures: 6; number of supplementary figures: 1
On the Robustness, Generalization, and Forgetting of Shape-Texture Debiased Continual Learning
Tremendous progress has been made in continual learning to maintain good
performance on old tasks when learning new tasks by tackling the catastrophic
forgetting problem of neural networks. This paper advances continual learning
by further considering its out-of-distribution robustness, in response to the
vulnerability of continually trained models to distribution shifts (e.g., due
to data corruption and domain shifts) at inference time. To this end, we propose
shape-texture debiased continual learning. The key idea is to learn
generalizable and robust representations for each task with shape-texture
debiased training. In order to transform standard continual learning to
shape-texture debiased continual learning, we propose shape-texture debiased
data generation and online shape-texture debiased self-distillation.
Experiments on six datasets demonstrate the benefits of our approach in
improving generalization and robustness, as well as reducing forgetting. Our
analysis of the flatness of the loss landscape explains these advantages.
Moreover, our approach can be easily combined with new advanced architectures
such as vision transformer, and applied to more challenging scenarios such as
exemplar-free continual learning.
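To make the self-distillation idea concrete, here is a minimal sketch. The logits, the three image views, and the equal-weight average are all hypothetical choices for illustration; the paper's actual shape-texture debiased data generation and online distillation details are not reproduced here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical logits from the same network on three views of one image:
# the original, a shape-preserving view, and a texture-preserving view.
logits_orig = np.array([2.0, 0.5, -1.0])
logits_shape = np.array([1.5, 0.8, -0.5])
logits_texture = np.array([0.2, 1.9, -0.8])

# Self-distillation sketch: the debiased target averages the shape-cue and
# texture-cue predictions so that neither cue dominates, and the prediction
# on the original image is pulled toward that target.
target = 0.5 * (softmax(logits_shape) + softmax(logits_texture))
loss = kl(target, softmax(logits_orig))
```

The point of averaging the two views is that a network fitting this target cannot rely exclusively on texture or on shape, which is the debiasing intuition behind learning robust, generalizable per-task representations.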